ABSTRACT 



In constructing a general purpose programming language, a key issue is 
providing a sufficient set of data types and associated operations in 
a manner that permits both natural problem-oriented notation and very 
efficient implementation. The language ELI contains a number of features 
specifically designed to simultaneously satisfy both requirements. The 
resulting treatment of data types includes provision for programmer- 
defined data types and generic routines, programmer control over type 
conversion, and very flexible data type behavior, in a context that 
allows efficient compiled code and very compact data representation. 
THE TREATMENT OF DATA TYPES IN ELI* 

Section 1: Introduction 

The prime function of a problem-oriented language is to provide a set of 
data types and associated operations sufficient to represent the unit objects 
and operations of its problem domain. This representation must on the one 
hand be very natural to the programmer and on the other be implementable on 
computing machines in a very efficient fashion. The success of FORTRAN, 
SNOBOL, and COBOL is due principally to their respectively providing such 
representation for scalars and arrays of numbers, strings, and data process- 
ing records. Each language has an envelope of applications in which program 
creation is natural and program execution is efficient. This envelope is 
determined primarily by the set of data types and operations it provides. 

In recent years there has been considerable effort to construct languages 
with significantly larger performance envelopes [1, 2, 3, 4, 5], that is, 
languages to serve many or all problem domains. In constructing such 
languages, the principal problem is providing a sufficient set of data types and 
associated operations in a manner that affords both naturalness of notation and 
efficiency of representation. We wish to stress that both considerations are 
absolutely vital. Either alone can be satisfied fairly easily. However, simultaneously 
achieving very efficient representation and natural notation for a wide 
variety of data types is a quite difficult matter. We stress the importance 
of both considerations because the really significant problems of 
computer science generally entail difficult programming projects where 
considerable effort is required for program creation and development and 
where the final result must be a very efficient running program. 

* Also included as number 4-71 in the publication series of the Harvard 
Center for Research in Computing Technology. 

The purpose of this paper is to discuss how this problem is solved in the 
programming language ELI. ELI is a working programming language† 
currently under further development as part of a research project in extensible 
languages at Harvard University. It has a number of features specifically 
designed to make possible a very flexible, yet very efficient, treatment of 
data types. These features, their shaping of data type handling, and their 
interaction with other aspects of the language are the topics of this paper. 

This paper is divided into nine sections of which this is the first. Section 2 
is a brief sketch of the language ELI, outlining its main features and establish- 
ing the notation to be used in the rest of the paper. ELI gives a somewhat un- 
conventional treatment to the union of data types; since this concept arises in 
several contexts, it is examined in detail in section 3. Section 4 describes the 
basic data type definition facilities of ELI. Section 5 discusses the evaluation 
of data type definitions and the implications of this to compilation. Section 6 
discusses the treatment of generic routines (roughly, routines whose action 
depends on the types of their arguments). Section 7 discusses type conversion 
and its interaction with generic routines. Section 8 deals with the more sophisti- 
cated aspects of the ELI data type definition facility: the mechanisms which 
allow the programmer detailed control over data type behaviors. Section 9 turns 
from the specific to the general— abstracting the techniques used in ELI and ex- 
amining to what extent they can be applied to other problem-oriented languages. 

†The present version of ELI runs on the PDP-10 under the 10/50 monitor. 
A version for TENEX PDP-10 is imminent. Versions for other machines 
are contemplated. 

Section 2: A Brief Sketch of ELI 

In written appearance, ELI is a fairly conventional programming language 
in the Algol 60 tradition. It includes variables and subscripted variables, 
prefix and infix operations, labeled statements and gotos, block structure, 
procedure calls, and assignments, all written in standard fashion. Many standard 
forms are somewhat generalized in ELI. For example, assignment is treated 
as a binary operator whose value is its left-hand operand. Also, blocks have 
values — the value of the last statement executed. Hence, 

X <- BEGIN B[J <- J+1] <- COS(W); FUM(B[J], Y) END 

adds one to J, then assigns COS(W) to B[J], then applies FUM to B[J] and Y, and 
finally assigns the result of FUM to X. Conditionals are specific types of 
statements in blocks. An if-then statement is written with a right-pointing arrow. 
That is, the Algol 60 construct "if B then E" is written in ELI as 

B -> E 

For example, in 

BEGIN 

I <- 0; 
L: A[I <- I+1] <- 0; 
I<N -> GOTO L 

END 

the loop is repeated until I reaches N. There is a second form of conditional 
statement, written with a double-shafted arrow, interpreted as: if the left-hand 
side is true, execute the right-hand side and exit the block with that value. 

Hence, the Lisp conditional (COND (B1 E1) (B2 E2) ... (Bn En)) is written 
in ELI as 

BEGIN 

B1 => E1; 
B2 => E2; 
... 
Bn => En 

END 



Simple conditionals, block-exit conditionals and unconditional statements can be 
freely intermixed. For example, the following block computes an approximate 
square root of a number A with initial approximation X to within EPS: 

BEGIN 

L: ABS(X**2-A)<EPS => X; 
X <- (X+A/X)/2; 
GOTO L 

END 

The block is exited only when the left-hand side of the first statement is true; 
when the exit is taken, the value of the block is X. 
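The control flow of this block can be sketched in Python as a rough analogy, not a rendering of ELI semantics. The function name approx_sqrt and its parameter names are chosen here for illustration; the block-exit conditional is modelled by return and the GOTO by the enclosing loop.

```python
def approx_sqrt(a: float, x: float, eps: float) -> float:
    """Analogue of the ELI block: leave with X once ABS(X**2-A) < EPS."""
    while True:
        if abs(x * x - a) < eps:   # L: ABS(X**2-A)<EPS => X;
            return x               # block-exit conditional supplies the block's value
        x = (x + a / x) / 2        # X <- (X+A/X)/2;
        # GOTO L corresponds to the next trip around the loop


root = approx_sqrt(2.0, 1.0, 1e-9)
```

As in the ELI block, the only way out of the loop is the first statement, and the value delivered is the current approximation.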

Variables are either formal parameters to a routine or variables local to 
a block. In either case, a variable is declared to be of some specific data type 
and is restricted to contain values of that type throughout its lifetime. 

Data types, termed "modes" in ELI, include the following built-in types: 
BOOL (Boolean), CHAR (character), INT (fixed point), REAL (floating point), 
REF (pointer unrestricted as to the mode of the object it can point to), 
SYMBOL (corresponding to non-numeric atoms in Lisp), MODE (the data type 
"data type"), FORM (the Lisp S-expression), and ROUTINE (procedure or 
operator). From the standpoint of creation, assignment, and use as arguments 
or formal parameters, all these modes are equally valid. Hence: 

DECL I,J: INT; 
creates integer-valued variables named I and J, while 

DECL M1, M2, COMPLEX: MODE; 
declares three mode-valued variables, and 

DECL F1, FOO, FUM, CSIGN: ROUTINE; 

creates four routine-valued variables. While all these modes are equally valid, 
they vary considerably in complexity. For example, a BOOL value is a single 
bit while a MODE value has associated with it all the information needed by the 
language to implement a data type. However, from the standpoint of the program- 
mer, the complexity is largely invisible. He is concerned only with the behavior 



of values having these modes: a BOOL value can be used in the left arm of a 
conditional while a MODE value can be used in declaring the type of a variable. 

Objects are distinct from variables in ELI. Variables may name objects, 
but the mapping is not one-to-one. That is, while each variable names some 
object, several variables may name the same object (e.g., when an argument 
is passed by reference to a routine), several variables may name different 
parts of a single object, and an object may be named by no variable. An object 
lies either on a block-structured stack (like that of Algol 60) or in the free 
storage region termed the heap (like that of Lisp or Algol 68). In the former 
case, the lifetime of an object is concomitant with that of the block in which it 
was created. In the latter case, an object remains until no variable names it 
and no pointers reference it. Garbage collection periodically reclaims objects 
in the heap no longer in use and returns them to the free storage pool. 

An object has a mode determined at the time of object creation. An object 
is created in one of two ways, either implicitly as the result of a declaration, 
or explicitly by means of the generators CONST and ALLOC. Objects created 
implicitly by declaration reside on the stack. Objects created by the explicit 
generator CONST also reside on the stack; objects created by the explicit 
generator ALLOC reside in the heap. As an example, suppose that the mode COMPLEX 
has been defined (a definition in the language will be given in section 4), then 
consider 

CONST(COMPLEX OF X, Y) 
ALLOC(COMPLEX OF X, Y) 

The first line constructs a complex number on the stack and returns this com- 
plex as its value; the second line constructs a complex number in the heap and 
returns a pointer to the complex number as its value. As an example of how the 
latter value may be used, consider 
DECL P:REF; 

P <- ALLOC(COMPLEX OF 3., 4.); 



The first line creates a pointer-valued variable P unrestricted as to the sort of 
object it can point to; the second line assigns to P a pointer to the complex 
number 3+4i. Given a pointer such as P, the object pointed to can be accessed 
by applying the function VAL, e.g., 

VAL(P) 

is the complex number 3+4i. The only means for creating a new pointer value 
is ALLOC. Hence, pointers point only to the heap, never to the stack. 

In ELI, the notion of a routine embraces both procedures and operators. 
A routine-valued variable may be assigned a routine-value, e.g., 

CSIGN <- EXPR (X:REAL; CHAR) 
BEGIN 

X > 0 => 'P; 
X < 0 => 'N; 
'Z 

END 

Here the routine has a single parameter† named X of mode REAL, delivers a 
CHAR value and has a body consisting of a block which computes the sign of its 
argument and yields the character P, the character N or the character Z. Any 
routine may be written as a function and applied to its arguments 

CSIGN(A[J]) 

In addition, a routine-valued variable can be declared to be a prefix operator 
and then applied to a single argument without enclosing the operand in 
parentheses. A routine taking two arguments can be declared as an infix 
operation and used accordingly. The standard operators such as +, *, -, /, >, <, 
and others are defined in this way as part of an initial, system-provided 
extension set. 

†The formal parameter X is bound by reference (in the sense of Fortran or 
PL/I), i.e., an assignment to X would change the value of the argument. Where 
this is not desired, one can declare that a formal parameter is bound by value 
(in the sense of Algol 60), in which case a private copy of the value of the 
argument is made. Assignments to the formal will then affect only the formal and 
will not change the value of the actual argument. 

The principal concern of this paper is with mode-valued constants, mode- 
valued variables, and mode-valued routines. The primitive data types (BOOL, 
CHAR, INT, REAL, and REF) mentioned previously are examples of mode- 
valued constants. To make the notion clear, consider 

DECL P: BOOL; 

P <- TRUE; 

This creates a Boolean valued variable P and later assigns it the Boolean value 
true (denoted by the Boolean constant TRUE). Analogously, consider 

DECL M1: MODE; 

M1 <- INT; 

This creates a mode-valued variable M1 and later assigns it the mode-value 
integer (denoted by the mode constant INT). In addition to the mode constants 
mentioned earlier, there are two others: NONE and ANY. The former is the 
mode of the empty object. The latter is the union of all possible modes; 
section 3 discusses this and other mode unions. 

A mode-valued variable may be used in any position where a mode value is 
required. For example, suppose that after the above assignment has been exe- 
cuted the following block is entered 

BEGIN 

DECL J: M1; 

END 

In this block, J is an integer valued variable. If M1 had some other mode value, 
say m, then J would be an m-valued variable. 

The concept of mode-valued routine is a logical consequence of treating 
modes as values. The simplest such routine is MD which takes a single 
argument and delivers its mode. For example, MD(FALSE)=BOOL and 
MD(SIN(X))=REAL. A more significant application of mode-valued routines is 
mode construction. That is, a set of primitive mode-valued routines provides 
the means for constructing new modes. These primitive mode constructors 
take modes as arguments and define new modes in terms of these. From the 
primitive mode constructors, the programmer can define other mode- 
constructing routines by means of functional composition, iteration, con- 
ditionals, and recursion. Mode constructors, primitive and programmer- 
defined, are discussed in sections 4, 5, and 8. 

The above sketch of ELI treats those facets of the language required for 
the purposes of this paper. (A more complete explanation of the language and 
a formal definition of both its syntax and semantics is found in [6].) However, 
the traditional discussion of a language per se neglects many facets of its 
usage and implementation. For ELI those considerations are particularly im- 
portant, and an understanding of several "extralingual" facets is essential to an 
understanding of this paper. ELI is the language component of a programming 
system called ECL.† The system is used on-line with two fully compatible 
language processors, an interpreter and a compiler.** Compiled and interpreted 
routines may be freely intermixed with no restrictions. 

One key point of the ECL system is that there is no rigid "compile time," 
"load time," "run time" distinction. Routines are interpreted until explicitly 
compiled. Compilation is carried out by calling the compiler as a subroutine 
and passing as argument the routine to be compiled. The compiler can be 
called at any point, e.g., while executing a routine. Hence, it is possible to 
compile a routine several times with very precise control over the degree of 
"binding." For each compilation of a routine, one can compute certain invariants 
of that compilation instance and then compile code which reflects these 
invariants. That is, suppose P is a routine with free variables $I_1, \ldots, I_n$. 
Suppose some k of these, $I_{j_1}, \ldots, I_{j_k}$, are bound to specific values 
$V_{j_1}, \ldots, V_{j_k}$ and the result is compiled. The code generated will be 
better, often substantially better, than the code for P had all variables been left free. 

†The system includes the usual facilities for on-line interaction such as a text 
editor, a trace feature, and a debugging package. It also includes multitasking, 
multitasking control primitives, and programmer-controlled interrupt processing. 
An overview of the entire system is given in [7]. 

**Although there is an interpreter and compilation is optional, the language has 
been strongly shaped by the expectation that production programs will be eventually 
compiled. For example, an interpretable-only language could be "type-less" 
with all variables free to take on values of any type. However, efficiency 
considerations lead one to a compiler and with it typed variables. Most of the 
declarative data type information is really of interest only to the compiler. 
However, to maintain compatibility between language processors, the interpreter 
verifies that the data type constraints are satisfied. 

The ELI compiler is called with two arguments — a procedure P and a list 
L of variables free in P which are to be so bound. For each variable I on L, 
each appearance of I in P is replaced by its value at the time of compilation. 
Such a variable is said to be frozen. If a routine identifier, the free variables 
of the routine, and the arguments to the routine are all frozen (or are otherwise 
constant) then the routine is evaluated during compilation and the value thus 
produced takes the place of the call. Applying this rule recursively, arbitrarily 
large amounts of a routine being compiled may collapse into values, i.e., 
become frozen. For example, if FOO, FIE, X and Y are all on L, then 

FOO(3, Y, FIE(X,Y, 3.2)) 

may be so frozen. In general, any syntactic unit made up only of constants or 
frozen forms is said to be evaluatable; it is replaced by its value and becomes 
frozen. Hence, depending on L, compilation may leave none or all free vari- 
ables in P and generate very tight or very loose bindings. 
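The freezing rule can be illustrated with a small Python sketch. This is a toy model, not the ECL compiler: expressions are nested tuples, the function freeze and the tuple encoding are inventions of this sketch, routines are assumed to have no free variables of their own, and the bodies given to FOO and FIE are arbitrary stand-ins.

```python
def freeze(expr, env, frozen):
    """Replace frozen variables by their current values and collapse
    calls whose routine and arguments are all frozen or constant."""
    kind = expr[0]
    if kind == "const":
        return expr
    if kind == "var":
        name = expr[1]
        return ("const", env[name]) if name in frozen else expr
    _, routine, args = expr                      # ("call", name, [args])
    new_args = [freeze(a, env, frozen) for a in args]
    if routine in frozen and all(a[0] == "const" for a in new_args):
        # Evaluatable: run the routine now and keep only its value.
        return ("const", env[routine](*(a[1] for a in new_args)))
    return ("call", routine, new_args)


# FOO(3, Y, FIE(X, Y, 3.2)); the bodies of FOO and FIE are invented here.
env = {"FOO": lambda a, b, c: a + b + c,
       "FIE": lambda x, y, z: x * y + z,
       "X": 2, "Y": 5}
expr = ("call", "FOO", [("const", 3), ("var", "Y"),
                        ("call", "FIE", [("var", "X"), ("var", "Y"), ("const", 3.2)])])

fully_frozen = freeze(expr, env, {"FOO", "FIE", "X", "Y"})
partly_frozen = freeze(expr, env, {"FOO", "FIE", "X"})
```

With all four names on the list the whole call collapses to a constant; leaving Y off the list substitutes only X and leaves both calls in place, which corresponds to a looser binding.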

Section 3: Mode Union 

The concept of mode union is treated rather specially in ELI. As an example 
of this treatment, recall that the mode constant ANY denotes the "union" 
of all modes. Consider an assignment to the routine-valued variable F 

F <- EXPR (X:ANY; BOOL) BEGIN . . . END; 

F then takes as argument a value of any mode, e.g., 

F(3), F(3.), F("W"), F(TRUE), F(REAL) 

are all legal calls on F. In each case, the X of this invocation of F is bound 
to the argument. The critical point of the ELI treatment of unions is as 
follows: in each case the X of this invocation of F takes on the mode of the 
argument and henceforth cannot change throughout its lifetime. Hence, in the 
first call on F, X is bound to 3 and becomes an integer. The value of X can be 
changed by an assignment, e.g., 

X <- 4; 

but the mode of X is fixed.† 

In section 4 we discuss modes which act as "restricted" unions, that is, 
mode_1 or mode_2 or ... or mode_n. There, as here, a formal parameter 
declared to have such a mode is bound at the time of call to some specific 
alternative from the set of possibilities; just which alternative is determined 
by the argument. Subsequent to creation, the formal parameter cannot change 
from that alternative mode. For example, a negation routine may be defined to 
have a single parameter whose formal mode is int or real or complex. The 
mode of the argument determines which one of these is the actual mode. 



†An analogy may be made with the length of an Algol 60 array. Consider, for 
example, the Algol 60 fragment 

begin real array A[1:N]; 

end 

A is declared to be an array of length N. For each instance of the block (i.e., 
block activation), the length of A is fixed to the then current value of N at the 
time of block entry. Subsequently, the value of A may change but not its length. 
The ELI union differs from a set theoretic union in exactly one respect: 
after an object is created, its mode is fixed to be a specific alternative. A set 
theoretic union would allow objects whose modes as well as values could be 
changed throughout their lifetime.† Clearly, the ELI union is a subcase of the 
more general set theoretic union. This restriction is imposed for two reasons: 
implementation efficiency and linguistic simplicity. By requiring objects to 
assume some definite (i.e., non-united) mode, there is never the need to 
allocate extra stack storage to provide for the contingency of a variable changing 
its mode and thereby assuming a larger size. Among other consequences, this 
makes possible a stack implementation of the mode ANY (this, of course, 
would not be possible for a set theoretic ANY). Since union is treated as 
postponement of a mode choice, the concept of union does not exist for the 
evaluator; i.e., each object has a definite unchanging mode. Hence, there is no need 
for special semantic rules to deal with unions.†† This simplifies both the 
learning and use of the language. Finally, the treatment of union in ELI 
integrates smoothly with the use of generic routines; this is discussed in section 6. 
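The distinction can be made concrete with a short Python sketch. It is only an analogy: the class UnionVar and its attribute names are hypothetical, and Python types stand in for ELI modes. The variable picks one alternative when it is created and rejects any later value of a different mode.

```python
class UnionVar:
    """A variable declared with a united mode: the alternative is chosen
    once, from the initial value, and never changes afterwards."""

    def __init__(self, alternatives, initial):
        if type(initial) not in alternatives:
            raise TypeError("initial value matches no alternative")
        self.mode = type(initial)      # fixed for the variable's lifetime
        self._value = initial

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new):
        if type(new) is not self.mode:  # the value may change, the mode may not
            raise TypeError("mode of a united variable is fixed at creation")
        self._value = new


x = UnionVar((int, float), 3)   # X is bound to 3 and becomes an integer
x.value = 4                     # X <- 4 is allowed
```

A set theoretic union would instead accept x.value = 4.5 and would therefore have to reserve room for the largest alternative.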

Section 4: Mode Construction 

Given sets $t_1, \ldots, t_n$, one can form new sets from these in several ways: 

(1) Cartesian product: $t_1 \times t_2 \times \cdots \times t_n$ 

(2) self product, definite and indefinite: $t^k$ and $\bigcup_{k=0}^{\infty} t^k$ 

(3) union: $t_1 \cup t_2 \cup \cdots \cup t_n$ 

(4) mappings: $(t_1 \times \cdots \times t_n) \rightarrow t_{n+1}$ 



†This is, for example, the treatment of mode union given in [8]. 

††As an example of the sort of issue we thereby avoid, consider the following. 
Let A be a true set theoretic union of int and bool and let its current value be 
an int. Suppose A is passed by reference (in the sense of PL/I) to a routine 
F which takes an int formal parameter named X. Since A currently has an int 
value, presumably this is legal. What if F uses A free and assigns a bool value 
to it? Does this affect X? 

These have the natural interpretations: structures (in the sense of COBOL or 
PL/I), arrays, unions, and routines. These four formation rules define 
classes of modes and, by the usual abuse of notation, four classes of objects 
belonging to these modes. Corresponding to each formation rule there is a 
primitive ELI routine (STRUCT, ARRAY, ONEOF, PROC) which generates 
new modes of that class. The basic mode definition mechanism of ELI is the 
set of primitive mode-valued constants and the set of primitive mode-valued 
routines. All other modes are generated from these. 

STRUCT takes as arguments a list of pairs (name_i : mode_i), where mode_i 
is the mode of the i-th component and name_i is its symbolic name. For example,† 

LIGHT_BULB <- STRUCT(HOURS_USED: REAL, 
WATTS: INT, 
COLOR: CHAR, 
BURNT_OUT: BOOL) 

defines a mode of class structure consisting of four fields: a real, an integer, 
a character, and a Boolean named HOURS_USED, WATTS, COLOR, and 
BURNT_OUT, respectively. The mode thus defined is assigned to the mode- 
valued variable LIGHT_BULB. Subsequent to the assignment, the variable may 
be used as a type declarer 

DECL X, Y, Z : LIGHT_BULB; 

creating variables X, Y, and Z of mode LIGHT_BULB, and three associated 
objects (in the stack) named by the identifiers. The individual components can 
be referred to by qualified naming (in the style of PL/I) so that 

Z.COLOR 

is a character field. Alternatively, a component can be selected by an integer 
subscript so that 

Z[J] 

is identical to Z.COLOR if and only if J has the value 3. Assignment of one 
LIGHT_BULB to another is denoted in the usual fashion 

X <- Z 

and copies all components of the structure. 

†The promised definition of the data type complex is: 

COMPLEX <- STRUCT(RE: REAL, IM: REAL) 

If Z is a complex variable, then Z.RE and Z.IM denote its two REAL components. 
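The behaviour described for STRUCT (a mode produced by a routine call, components reachable by name or by 1-based subscript, assignment that copies every component) can be sketched in Python. This is an illustration under stated assumptions: Python classes stand in for modes, the method name assign stands in for the ELI assignment operator, and str is used for CHAR.

```python
def STRUCT(**fields):
    """Return a new class with one typed component per (name: mode) pair."""
    names = tuple(fields)

    class Struct:
        def __init__(self, **values):
            for name in names:
                self._store(name, values[name])

        def _store(self, name, value):
            if type(value) is not fields[name]:
                raise TypeError(f"{name} must be {fields[name].__name__}")
            self.__dict__[name] = value

        def __getitem__(self, j):        # Z[J], counting components from 1
            return getattr(self, names[j - 1])

        def assign(self, other):         # X <- Z copies all components
            for name in names:
                self._store(name, getattr(other, name))

    return Struct


LIGHT_BULB = STRUCT(HOURS_USED=float, WATTS=int, COLOR=str, BURNT_OUT=bool)
z = LIGHT_BULB(HOURS_USED=12.5, WATTS=60, COLOR="W", BURNT_OUT=False)
x = LIGHT_BULB(HOURS_USED=0.0, WATTS=40, COLOR="R", BURNT_OUT=True)
x.assign(z)
```

Because assign copies component by component, a later change to z leaves x untouched, matching the copy semantics described above.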

ARRAY generates either definite or indefinite self product, depending on 
how it is called. 

ARRAY(k, M) 

generates the mode $M^k$ while 

ARRAY(M) 

generates the mode $\bigcup_{k=0}^{\infty} M^k$. In the latter case, the mode is said to be length 
unresolved. While the mode is length unresolved, any particular instance of 
such a mode has a fixed length determined at the time the instance is created. 

For example, 

CARD <- ARRAY(80, CHAR); 

defines the mode "array of 80 characters" and assigns it to the mode-valued 
variable CARD. Any variable of mode CARD 

DECL C: CARD; 
has exactly 80 components which may be accessed by subscripting 

C[I] 
The mode "length unresolved array of characters" may be defined by 

STRING <- ARRAY(CHAR); 




This creates a mode whose instances may be of any length. The length of each 
instance is, however, fixed at the time of creation 

DECL S: STRING BYVAL CONST(STRING SIZE 200); 

This creates a variable S of mode STRING and initializes it to a STRING of 200 
components. Subsequently, the values of S's components may change but not the 
number of components. The number of components in an array may be deter- 
mined by applying the primitive routine LENGTH, e.g., LENGTH(S)=200. As 
with structures, assignment of arrays is written using the assignment oper- 
ator and copies all components. 

ONEOF($t_1, t_2, \ldots, t_n$) defines a "union" of n alternative modes $t_1, \ldots, t_n$, 

where "union" is used in the sense described in section 3. That is, a variable 
declared to be of such a mode takes on some specific alternative determined 
by its initial value. For example, 

ARITH <- ONEOF(INT,REAL); 

SIGN <- EXPR(X : ARITH; ARITH) BEGIN . . . END 

SIGN(-13) 

DECL Y: ARITH BYVAL P(X) 

In the second line, the routine is declared to take a single argument which is 
either INT or REAL. In the call to SIGN, in line 3 an INT is used so that X in 
this invocation of SIGN is an INT. In the fourth line, Y is declared to be either 
an INT or a REAL — which one is determined by the mode of P(X) on each exe- 
cution of this line. 

PROC($t_1, \ldots, t_n$; $t_{n+1}$) defines the mode "mapping from $t_1 \times \cdots \times t_n$ into 
$t_{n+1}$". For example, 

TRIG <- PROC(REAL; REAL) 

defines the mode "set of routines which map reals into reals," while 

CODE <- PROC(CHAR;INT) 
defines the mode of routines which convert characters into integers. 

In addition to the four classes of modes described above, there is a fifth 
class which arises from other than set theoretic considerations: the class 
pointer. PTR(t) is the mode "pointers restricted to point to objects of mode t" 
and PTR($t_1, \ldots, t_n$) is the mode "pointers restricted to point to $t_1$'s or $t_2$'s or 
... or $t_n$'s". Here, unlike the situation with ELI unions, no commitment is 
made when such a pointer is created. Such a variable may first point to a $t_1$, 
later to a $t_2$, and still later to a $t_n$. For example, 

DECL SP: PTR(INT, REAL, COMPLEX); 

creates a variable SP whose mode is "pointer to INT or REAL or COMPLEX." 
Like all pointers, SP is given the default initial value NIL, meaning a pointer 
to nothing. Assignments to SP may change this value 

SP <- ALLOC(COMPLEX OF 3., 4.) 

so that SP points to a complex number whose value is 3+4i. 

Section 5: The Evaluation of Mode Definitions 

All the primitive mode generators share one common trait — they evaluate 
their arguments and these arguments may be any syntactic form which yields an 
appropriately typed value. This trivially leads to multidimensional arrays such 
as 

REAL_MATRIX <- ARRAY(ARRAY(REAL)); 

multilevel structures, arrays of structures such as 

ARRAY(4, STRUCT(RE: REAL, IM : REAL)) 
and structures of arrays and pointers such as 

STRUCT(A: INT, B: PTR(INT), C: ARRAY(INT) ) 
This illustrates only one case of evaluated arguments to mode generators. 
Mode-valued variables, conditionals, and other routines are equally acceptable. 
For example, 

ARRAY(N**2, F(X) ) 

defines the mode: "array of N**2 F(X)'s" where N and F(X) are determined at the 
point that ARRAY is called. Turning to a more complex example, the following 
loop computes the mode: "complete binary tree of depth N whose terminal nodes 
are integers" 

BEGIN 

DECL TEMP : MODE BYVAL INT; 

DECL I : INT BYVAL N; 
L: (I <- I-1) < 0 => TEMP; 

TEMP <- STRUCT(L: TEMP, R: TEMP) ; 

GOTO L 
END 

The declaration creates a local variable TEMP of type MODE and initializes it to the 
value integer . The loop assigns to TEMP successive elements from the sequence 

STRUCT(L: INT, R: INT) , 

STRUCT(L: STRUCT(L: INT, R: INT) , 
R: STRUCT(L: INT, R: INT) ) , 



It should be noted that the ELI treatment of mode definition is quite differ- 
ent from that found in other programming languages, such as Algol 68. Tra- 
ditionally, mode definition has been a static operation carried out at compile 
time. By treating the mode-defining operators as executable routines which 
evaluate their arguments, ELI obtains a more flexible and more powerful means 
of mode creation. The most important single consequence is the notion of 
programmer-defined, mode-valued routines. Consider, for example, convert- 
ing the above binary tree generator into a routine. 

TREE <- EXPR(I: INT BYVAL, M: MODE BYVAL; MODE) 
BEGIN 

L: (I <- I-1) < 0 => M; 

M <- STRUCT(LEFT: M, RIGHT: M); 
GOTO L 
END 
TREE takes the depth (I) and leaf mode (M) as arguments, both passed by 
value. The loop is the same as before, except that now it generates the sequence 
depending on the value of the leaf mode. Hence, TREE(I, M) is the mode binary 
tree of depth I and leaves of mode M, for any integer I and mode M. 
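A Python sketch of the same mode-valued routine may make the iteration easier to follow. It assumes, purely for illustration, that a mode is represented as a plain value: a string for a primitive mode and a tagged tuple for a structure. The helper leaf_count is not part of the paper; it is added only to inspect the result.

```python
INT = "INT"
REAL = "REAL"


def STRUCT(**fields):
    """Represent a structure mode as a tagged tuple of (name, mode) pairs."""
    return ("STRUCT", tuple(fields.items()))


def TREE(depth, leaf):
    """Mode 'complete binary tree of the given depth with leaves of mode leaf'."""
    i, mode = depth, leaf            # both taken by value, as in the ELI routine
    while True:
        i -= 1                       # (I <- I-1)
        if i < 0:                    # ... < 0 => M
            return mode
        mode = STRUCT(LEFT=mode, RIGHT=mode)


def leaf_count(mode):
    """Count the terminal nodes of a mode built by TREE (illustrative helper)."""
    if isinstance(mode, tuple) and mode[0] == "STRUCT":
        return sum(leaf_count(m) for _, m in mode[1])
    return 1
```

For instance, TREE(0, INT) is just INT, while TREE(3, REAL) is a nest of structures with eight REAL leaves.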

A second example may be of use. Consider defining the mode multi- 
dimensional array of order K of Ms. (For illustrative purposes, we use a 
recursive definition; an iterative one would, in fact, do just as well.) 

MULTI_ARRAY <- EXPR(K:INT, M : MODE; MODE) 

BEGIN 

K=0 => M; 

ARRAY(MULTI_ARRAY(K-1, M) ) 
END 

MULTI_ARRAY of K M's is either M (if K is 0) or is an ARRAY of the result 
obtained by applying MULTI_ARRAY to K-1 and M. The definition is obvious 
and would be somewhat uninteresting were it the only one possible. However, 
there are other ways of constructing multidimensional arrays which, for some 
purposes, are far superior to the one given above. If, for example, a frequent 
operation is exchanging entire rows, then it will be advantageous to use an 
array of pointers to the constituent rows. The generalization of this to order K 
is defined 

P_ARRAY <- EXPR(K: INT, M: MODE; MODE) 
BEGIN 

K=0 => M; 

K=1 => ARRAY(M); 

ARRAY(PTR(P_ARRAY(K-1, M))) 
END 

The K=0 case should be obvious; for K=l we define a conventional array; for 
higher K's we construct an ARRAY of PTRs to the result of P_ARRAY applied 
to K-1 and M. 
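The two recursive definitions can be compared side by side in a Python sketch. As an assumption made only for this illustration, modes are represented as tagged tuples and primitive modes as strings; ARRAY and PTR here merely build those tuples and do not allocate anything.

```python
def ARRAY(element):
    """Length-unresolved array mode over the given element mode."""
    return ("ARRAY", element)


def PTR(target):
    """Pointer mode restricted to the given target mode."""
    return ("PTR", target)


def MULTI_ARRAY(k, m):
    """Order-k array of m: m itself for k = 0, else an array of order k-1 arrays."""
    return m if k == 0 else ARRAY(MULTI_ARRAY(k - 1, m))


def P_ARRAY(k, m):
    """Order-k array built from arrays of pointers to the constituent rows."""
    if k == 0:
        return m
    if k == 1:
        return ARRAY(m)
    return ARRAY(PTR(P_ARRAY(k - 1, m)))


direct = MULTI_ARRAY(2, "REAL")    # array of arrays of REAL
indirect = P_ARRAY(2, "REAL")      # array of pointers to arrays of REAL
```

The two results differ only in the PTR layer, which is what makes exchanging whole rows a matter of swapping pointers in the second representation.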

The point of programmer-defined mode routines is that they permit signifi- 
cant functional abstraction. Instead of talking loosely about some collection of 
related modes, one can define a collection precisely by means of a routine 
which generates it. Mode sets such as matrices, binary trees, lists, rings, 
hash tables, etc., of various element types can be defined by their generators. 
This permits the creation of mode-definition libraries. More im- 
portant, it allows the programming of algorithms which act on a class such as 
binary trees without regard to the constituent elements. Only during compi- 
lation is it necessary to freeze free variables to determine which specific sort 
of binary tree. Finally, it allows one to prove properties of mode sets inde- 
pendent of their particular elements by appealing to the properties of the mode 
set generating routine. 

It must be stressed that the considerable generality provided by dynamic 
execution of mode definitions does not exact a price in inefficient code. If any- 
thing, the facility allows for far better code generation. It was explained in 
section 2 that in compiling a routine R1, one could freeze the values of free 
variables to their then current values. Suppose we execute 

M <- P_ARRAY(E1, E2) 

M then assumes some definite mode value. If R1 uses M as a type declarer, 
leaving M as a free variable, then R1 may be compiled with M frozen. The 
specific mode value of M will be used. It is therefore possible and practical for 
a program to compute the modes it uses and compile parts of itself specific to 
these computations. 

Section 6: Generic Routines 

The main reason for having united modes (e.g., ANY and those generated 
by ONEOF) is to type formal parameters for routines which accept several 
distinct types of arguments. Such routines (e.g., the operator + in Algol 60) do 
not convert these arguments to fixed types but rather perform different actions 
dependent on argument types. Such routines are termed generic. Almost all 
languages have such routines, but almost always as built-in operations. In this 
section we discuss how the programmer defines his own generic routines in ELI. 

The basics have already been discussed: one needs the ability to declare 
formal parameters having united modes and a means of testing the actual modes 
of these parameters. The latter is provided by the primitive routine MD; for 
any expression E, MD(E) is the mode of the value of E. To illustrate a possible, 
if far from satisfactory, technique, consider defining + to act on INTs, REALs, 
and COMPLEXs. 

    SCALAR <- ONEOF(INT, REAL, COMPLEX); 

    + <- EXPR(X:SCALAR, Y:SCALAR; SCALAR) 
        BEGIN 
            (MD(X)=INT) AND (MD(Y)=INT) => FIXADD(X, Y); 
            (MD(X)=COMPLEX) AND (MD(Y)=COMPLEX) => 
                CONST(COMPLEX OF X.RE+Y.RE, X.IM+Y.IM); 
            (MD(X)=INT) AND (MD(Y)=REAL) => 
                FLOATADD(FLOAT(X), Y); 
            etc. 
        END 

The + routine is here declared to take two arguments — each of which may be 
one of {INT, REAL, COMPLEX}. The routine body tests the types of its argu- 
ments on each invocation and dispatches to the appropriate code section. 

This has two principal defects: (1) The type testings and their conjunctions 
are redundant and hence tedious to read and write. (2) It is difficult for the com- 
piler to exploit knowledge it may have concerning the modes of arguments. For 
example, if A has been declared to be an INT, then A+3.2 will invoke the third 
alternative, but how is the compiler to know this? It could, of course, make 
the deduction by "interpreting" the + routine. While this will work in principle, 
it seems an unnecessarily difficult approach. Instead, we impose additional 
structure on the program — structure which simultaneously makes the code more 
readable by man and more comprehensible by the compiler. 

The traditional means for imposing structure in a programming language is 
with a new syntactic form, here the GENERIC form. A GENERIC form† is 
delimited by the brackets "GENERIC" and "END" and contains a set of 
conditional-like statements whose left arm is a set of modes and whose right 
arm is an alternative value of the generic form. For example, the above + 
routine may be directly recoded using a GENERIC form as its body 

    + <- EXPR(X:SCALAR, Y:SCALAR; SCALAR) 
        GENERIC 
            [INT, INT] => FIXADD(X, Y); 
            [COMPLEX, COMPLEX] => CONST(COMPLEX OF X.RE+Y.RE, X.IM+Y.IM); 
            [INT, REAL] => FLOATADD(FLOAT(X), Y); 
            etc. 
        END 

This may be read as an analogue to a set of conditionals: 

    if the 1st arg is an INT and the 2nd arg is an INT then FIXADD(X, Y); 
    else if the 1st arg is a COMPLEX and the 2nd arg is a COMPLEX then 
        construct a COMPLEX of X.RE+Y.RE and X.IM+Y.IM; 
    else if . . . then . . . 

The alternatives are considered in turn until one is found which matches the 
actual modes of the arguments on this invocation of +. The last statement can 
optionally be of the form 

    ELSE (expression) 

If this is present and none of the alternative sets match, the last (expression) 
is taken as the value of the GENERIC; if this is not present and there is no 
match, a system error routine is called. The importance of the GENERIC to 
compilation is, of course, that "considering the alternatives" can usually be 
carried out during compilation so that compiled calls on + can usually be 
replaced by a call on the right-hand side of the appropriate alternative statement. 

†A GENERIC form may appear anywhere within a routine; the left-hand arm of 
each statement is always compared with the arguments to the routine itself. 
This is useful for routines which take one or more generic arguments, but 
contain substantial computation E which does not depend on the modes of these 
arguments (e.g., computation based on the non-united arguments). Such 
routines can be written with an embedded GENERIC form. The code for E is 
then effectively shared among the various generic alternatives. 

It should be obvious that if there are several generic formals, the number 
of possible combinations can grow to unwieldy size. Even with a concise 
notation for expressing alternatives, this is unacceptable. It need not, however, 
arise. An element of the mode set of a GENERIC statement can be an arbitrary 
syntactic form, so long as the value it produces is a mode. Hence, a GENERIC 
statement such as 

    [CHAR, ONEOF(INT, STRING), ANY] => . . . 

will cover (i.e., match against) each of the following sets of argument modes 

    {CHAR, INT, REAL}, {CHAR, STRING, INT}, {CHAR, STRING, MUMBLE}. 

In general, a mode G in a mode set covers an argument mode A if any of the 
following hold: 

    (1) G = ANY 
    (2) G is a generic mode ONEOF(t_1, ..., t_n) and A = t_i for some i 
    (3) G = A 
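
The run-time behavior of a GENERIC form with the three covering clauses can be sketched in Python. This is an illustration under assumptions, not the ELI implementation: Python's `type(x)` plays the role of MD, the built-in `int`, `float`, `complex`, and `str` stand in for INT, REAL, COMPLEX, and STRING, and `generic`, `oneof`, `covers`, and `GenericError` are hypothetical names.

```python
ANY = object()  # stands for the mode ANY


def oneof(*modes):
    """A united mode, modelled as a frozenset of alternative modes."""
    return frozenset(modes)


def covers(g, a):
    """Clauses (1)-(3): does pattern mode g cover the actual mode a?"""
    if g is ANY:
        return True
    if isinstance(g, frozenset):
        return a in g
    return g == a


class GenericError(Exception):
    """Raised when no alternative matches and there is no ELSE arm."""


def generic(alternatives, otherwise=None):
    """Build a routine from ordered (mode_set, body) pairs; first match wins."""

    def routine(*args):
        actual = [type(x) for x in args]      # plays the role of MD
        for mode_set, body in alternatives:
            if len(mode_set) == len(actual) and all(
                covers(g, a) for g, a in zip(mode_set, actual)
            ):
                return body(*args)
        if otherwise is not None:
            return otherwise(*args)
        raise GenericError(f"no alternative for {[m.__name__ for m in actual]}")

    return routine


plus = generic([
    ((int, int), lambda x, y: x + y),
    ((complex, complex), lambda x, y: complex(x.real + y.real, x.imag + y.imag)),
    ((int, float), lambda x, y: float(x) + y),
])

classify = generic(
    [((str, oneof(int, str), ANY), lambda c, k, rest: "first alternative")],
    otherwise=lambda *args: "else arm",
)
```

A single pattern such as `(str, oneof(int, str), ANY)` stands for many concrete argument-mode sets, which is what keeps the number of alternatives manageable.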

Even collections of modes will, in some cases, prove too restrictive in 
performing generic selection. Consider, for example, a print routine which 
takes a single argument X, an object to be printed 

    EXPR(X:ANY; NONE) . . . 

X can be of ANY mode and no result is returned. The routine is to print X in 
one of three formats, depending on whether X is a structure, an array, or a 
pointer. Testing this is straightforward: ELI includes a primitive routine 
STRUCTP which is a predicate true of structures only. Hence 

    STRUCTP(X) => . . . 

is the test which checks for the first print class. The trick is to make the 
discrimination while compiling a call on the print routine. 

This can be done if the left-hand side of a generic statement is generalized 
to 

    [M_1, ..., M_n] P 

where the M's are forms which evaluate to modes to be compared to the 
argument modes and P is an arbitrary form producing a Boolean value. An 
alternative is chosen only if all modes match and P is true.† For example, 
the desired print routine has the structure 

    EXPR(X:ANY; NONE) 
        GENERIC 
            STRUCTP(X) => . . . ; 
            ARRAYP(X) => . . . ; 
            ELSE . . . 
        END 

†P may be absent, in which case it is taken as TRUE. Similarly, any of the 
M's can be absent, in which case they are taken as ANY. Note that the mode set 
is semantically unnecessary since all mode checks could be carried out in the 
predicate. However, factoring the selection into two parts (a simple pattern 
match and an arbitrary predicate) is useful for pragmatic reasons. 

Any predicate whatever can be used as part of a generic alternative. This 
provides a very general mechanism for the programmer to control compilation, 
i.e., to perform once at compile time a choice which would otherwise be made 
repeatedly during execution. Hence, the generalized GENERIC form is quite 
powerful. The facility it provides is related to the freezing of free variables 
during compilation. The difference is this. Freezing allows one to produce 
individual compilations of a routine, each tailored to some specific 
environment; the GENERIC form allows a single routine to take several alternative 
actions, yet allows choice among the alternatives to be made when compiling 
a call on the routine. 

The compiler gives special treatment to GENERICs under two 
circumstances: (1) in compiling a routine whose body is a single GENERIC form, 
and (2) in compiling another routine which contains one or more calls on 
routines of type 1. We consider these in turn. 

When given a routine such as + whose body is a single GENERIC form, 
the compiler produces: 

    (1) a main body for the routine, 
    (2) a set of alternative bodies, one for each alternative GENERIC statement. 

The main body consists of 

    (1.1) a flag indicating that this is the main body of a GENERIC, 
    (1.2) executable code, 
    (1.3) a table of alternative mode/predicate sets, 
    (1.4) an array of pointers to the alternative bodies. 

The main body has the original formal mode set, e.g., for the + routine this is 
(SCALAR, SCALAR; SCALAR). It can be called directly (e.g., from interpreted 
code), in which case it tests the alternative mode/predicate sets against the 
arguments and dispatches to the appropriate alternative body. That is, calling 
the main body simply invokes type testing during execution. The alternative 
bodies are themselves complete code blocks† which can be called directly. 

†Mode information derived from the generic alternatives can be used to 
considerable advantage in compiling these bodies. For example, consider 

    EXPR(X:ANY, Y:ONEOF(BOOL, CHAR, STRING); REAL) 
        GENERIC 
            [INT, BOOL] => F_1; 
            [ONEOF(REAL, CHAR), STRING] => F_2 
        END 

The first alternative, F_1, has the formal modes (INT, BOOL; REAL) and is 
compiled under the assumption that X is an INT and Y is a BOOL. Similarly, F_2 
has the formal modes (ONEOF(REAL, CHAR), STRING; REAL) and is compiled 
accordingly. It may be that F_1 and F_2 are textually identical, yet the different 
assumptions of argument modes will lead to different compiled code, each block 
being tailored to its formal modes. 

Consider next the actions of the compiler on another routine, say FOO, 
which contains a call on a generic routine (e.g., the + routine in X+FUM(Z)). 
Since the + body is flagged as being GENERIC, an attempt is made to discover 
which alternative would be chosen were the decision deferred until run-time. 
There are two possibilities: (1) Some alternative is chosen, say the i-th, in 
which case the compiler generates a call directly on the i-th body. (2) The 
compiler discovers that it cannot make a choice, in which case it generates a 
call on the main body. Which case applies is determined by the modes of the 
arguments and the alternative mode/predicate sets in the GENERIC. 

Consider first the modes of the arguments. When the + routine is called, 
its two arguments will have some definite (i.e., non-generic) mode. However, 
the compiler has access only to declarative information and from this must 
deduce what we shall term compilation modes. In some cases, these will be 
less precise than the actual argument mode. For example, a formal parameter 
which is ONEOF(INT, REAL) has a united compilation mode. Similarly, 
a block such as 

    BEGIN P(X) => "IN"; 0. END 

returns either a SYMBOL or a REAL and hence has compilation mode 
ONEOF(SYMBOL, REAL). 

Since the compilation mode of an argument may be united, the generic 
selection mechanism must in general be prepared to take a united argument 
mode. Hence, the definition of covers given above must be expanded to include 
this case. In general, a mode G in a mode set covers an argument mode A if 
any of the following hold: 

    (1) G = ANY 
    (2) G is a generic mode ONEOF(t_1, ..., t_n) and A = t_i for some i 
    (3) G = A 
    (4) G and A are both generic and each alternative of A is an alternative of 
        G (i.e., G ⊇ A). 

The fourth clause raises the possibility of the compiler deciding that it cannot 
make a compile-time choice. If G is ONEOF(t_1, ..., t_n) and A is 
ONEOF(t'_1, ..., t'_m), it may be that G does not cover A but G covers one or 
more of the t'_j. We say that G partially covers A. If the actual mode is one of 
the t's in G, then G will cover the actual mode and the alternative may be 
chosen, otherwise it will not. The compiler cannot tell which will be the case 
and hence must postpone generic selection until execution of the function call. 
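
The extended covering relation and the notion of partial cover can be written down compactly if a united mode is modelled as a set of alternatives. The Python below is a sketch under assumptions: modes are plain strings, `oneof`, `covers`, and `partially_covers` are hypothetical names, and the treatment of an argument whose compilation mode is ANY is an assumption of this sketch rather than something the text specifies.

```python
ANY = "ANY"


def oneof(*modes):
    """A united (generic) mode, modelled as a frozenset of mode names."""
    return frozenset(modes)


def alternatives(mode):
    """The set of definite modes a mode may stand for."""
    return mode if isinstance(mode, frozenset) else frozenset([mode])


def covers(g, a):
    """Clauses (1)-(4): g covers a if g is ANY or every alternative of a is in g."""
    if g == ANY:
        return True
    return alternatives(a) <= alternatives(g)


def partially_covers(g, a):
    """g accepts some, but not all, of the modes that a might turn out to be."""
    if covers(g, a):
        return False
    if a == ANY:          # assumption: an ANY argument may turn out to be anything
        return True
    return bool(alternatives(g) & alternatives(a))
```

Clauses (2), (3), and (4) all reduce to one subset test, and partial cover is a non-empty intersection that falls short of a full cover.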

Given a set of compilation modes (M_1, ..., M_n) for the arguments and a table 
of alternative mode/predicate sets each of the form [F_i1, ..., F_in] P_i, 
choosing the appropriate GENERIC alternative proceeds as follows. The 
alternative sets are considered in turn, starting with the first. If each formal 
mode F_ij covers M_j and if P_i is evaluatable and true, then the i-th 
alternative is chosen. If for some j, F_ij partially covers M_j, or if P_i is not 
evaluatable, then generic selection cannot be made during compilation. 
Otherwise, the next alternative set is considered. 
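
The selection procedure just described can be sketched in Python. The sketch makes several assumptions: modes are strings, united modes are frozensets, a predicate is a function that returns True, False, or None (None meaning "not evaluatable at compile time"), and the names `select` and `DEFER` are hypothetical. It follows the wording of the paragraph literally, including the rule that an unevaluatable predicate forces deferral.

```python
ANY = "ANY"
DEFER = "DEFER"   # selection must be postponed until run time


def alternatives(mode):
    return mode if isinstance(mode, frozenset) else frozenset([mode])


def covers(g, a):
    return g == ANY or alternatives(a) <= alternatives(g)


def partially_covers(g, a):
    return not covers(g, a) and bool(alternatives(g) & alternatives(a))


def select(compilation_modes, table):
    """Return the index of the chosen alternative, DEFER, or None (no match).

    table is an ordered list of (formal_modes, predicate) pairs.
    """
    for i, (formal_modes, predicate) in enumerate(table):
        pairs = list(zip(formal_modes, compilation_modes))
        verdict = predicate(compilation_modes)   # True, False, or None
        if all(covers(f, m) for f, m in pairs) and verdict is True:
            return i                             # call the i-th body directly
        if any(partially_covers(f, m) for f, m in pairs) or verdict is None:
            return DEFER                         # call the main body instead
    return None                                  # ELSE arm or system error


def always(modes):
    """A predicate that is absent, hence taken as TRUE."""
    return True


PLUS_TABLE = [
    (("INT", "INT"), always),
    (("COMPLEX", "COMPLEX"), always),
    (("INT", "REAL"), always),
]
```

With `PLUS_TABLE`, the call A+3.2 with A declared INT has compilation modes `("INT", "REAL")` and resolves to the third alternative (index 2) without any run-time test.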

A key point here is determining whether a predicate is evaluatable. This is 
handled by a variation of the technique for free variable freezing discussed in 
section 2. The compiler treats specially those primitive routines whose value 
might be known (e.g., STRUCTP, LENGTH, MD). Each such routine has a set 
of enabling conditions which depend only on data known during compilation. 
When these are satisfied, values can be calculated and substituted into the 
computation tree of the predicate. Whenever a routine and its arguments in the 
computation tree of the predicate are thus fixed, the routine is applied, propa- 
gating the values upward. The compiler applies this process wherever possible 
until all possible upward propagation has been carried out. If the entire compu- 
tation tree of the predicate collapses into a single value, then the predicate is 
evaluatable and its value is known; otherwise, the predicate is not evaluatable. 
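
The upward propagation of known values through a predicate's computation tree is a form of constant folding. The following Python sketch assumes a tiny tree representation (`("const", v)`, `("var", name)`, `("call", name, [args])`), a table `known_modes` of variables whose modes are known exactly during compilation, and MD as the only primitive with an enabling condition; all of these names are illustrative, not ELI's.

```python
import operator

CONST, VAR, CALL = "const", "var", "call"

# Routines that may be applied at compile time once all arguments are fixed.
ROUTINES = {
    "=": operator.eq,
    "AND": lambda a, b: a and b,
    "OR": lambda a, b: a or b,
}


def fold(node, known_modes):
    """Collapse every subtree whose value is fixed during compilation."""
    kind = node[0]
    if kind == CONST or kind == VAR:
        return node
    _, name, args = node
    if name == "MD":
        (arg,) = args
        # Enabling condition: the argument is a variable of known mode.
        if arg[0] == VAR and arg[1] in known_modes:
            return (CONST, known_modes[arg[1]])
        return node
    folded = [fold(a, known_modes) for a in args]
    if name in ROUTINES and all(a[0] == CONST for a in folded):
        return (CONST, ROUTINES[name](*(a[1] for a in folded)))
    return (CALL, name, folded)


def evaluatable(predicate, known_modes):
    """Return (True, value) if the whole tree collapses, else (False, None)."""
    result = fold(predicate, known_modes)
    if result[0] == CONST:
        return True, result[1]
    return False, None


def md_is(var, mode):
    """Helper building the tree for MD(var) = mode."""
    return (CALL, "=", [(CALL, "MD", [(VAR, var)]), (CONST, mode)])


PREDICATE = (CALL, "AND", [md_is("X", "INT"), md_is("Y", "REAL")])
```

The sketch does not short-circuit: a tree such as FALSE AND (unknown) is reported as not evaluatable, which is conservative but safe.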

Section 7: Type Conversion 

The most important single point concerning type conversion is that it is 
different from the generic mechanism. The two concepts are, in fact, almost 
orthogonal. Both are mechanisms which allow a routine to be called with 
arguments belonging to a set of possible modes, but here the similarity stops. With 
the generic mechanism, the routine has a corresponding set of possible 
parameter modes. With the type conversion mechanism, the routine has a single 
parameter mode and values of different modes are converted to that mode. 

Traditionally, the choice of conversion routines to be used is fixed for all 
time by the language designer. Even in traditional languages, there is little 
reason for this early freezing. Where it is possible for the programmer to 
define new types, it becomes essential that he be permitted to specify the as- 
sociated conversions. 

Hence, the treatment of type conversion in ELI is designed to satisfy two 
goals: (1) smoothly meshing type conversion with the generic mechanism and 
(2) allowing the programmer to specify what the type conversion will be. In out- 
line, the technique used in ELI is as follows. First a test is made to see if the 
formal mode covers the argument mode (i.e., either the modes are equal or the 
formal is generic and one of its alternatives is equal to the argument mode). 
Failing this, the argument is converted to a value belonging to the formal mode 
using a type conversion routine associated with the argument mode. 

For example, suppose that the + routine has been defined on pairs of 
SCALARs (defined as ONEOF(COMPLEX, REAL, INT)) and that + is called with 
a STRING-valued argument A. In computation tree terms, we have the direct 
tree for the computation, say 3+A 

        + 
       / \ 
      3   A 

Since the formal mode of the second parameter (SCALAR) does not cover the 
argument mode (STRING), the direct tree is replaced with the implicit tree 

        + 
       / \ 
      3   C(STRING) 
           /     \ 
          A     SCALAR 

where C(STRING) is the conversion routine associated with STRING. As with all 
conversion routines, this takes two arguments: the object to be converted and 
the desired mode of the converted result. Here, the desired mode is the formal 
mode of the second parameter, SCALAR. 

The association of a conversion routine with a mode is performed by an 
assignment. A mode such as STRING can be treated as a structure having a 
set of components, one of which is named CONVERT. Assignment of a routine 
to the CONVERT field of a mode establishes that routine as the conversion 
function for the mode. For example, a possible conversion routine for STRING 
could be set up by: 

    STRING.CONVERT <- 
        EXPR(X:ANY, FM:MODE; ANY) 
            GENERIC 
                FM COVERS INT => STRING_TO_INT(X); 
                FM COVERS SYMBOL => HASH(X); 
            END 

Like most conversion routines, this is a GENERIC. The first alternative 
consists of a single predicate which uses the infix operator COVERS. This 
is true if its left-hand operand is a mode which covers its right-hand oper- 
and. Hence, the first GENERIC is chosen whenever the desired mode 
includes INT. This is the case in the above call on +, so STRING_TO_INT 
is applied to A. The result, an integer, is taken as the actual argument to 
the + routine. As with other generics, the significant point is choosing the 
appropriate alternative of STRING.CONVERT when compiling the call on +. 
We address this pragmatic question after examining a few semantic issues. 

In so doing, a summary of the discussion thus far may be of use. 
During evaluation, the interpreter may have in hand some actual value A 
of mode M_A and some desired formal mode M_F such that M_F does not 
cover M_A. The programmer can arrange that in all such circumstances 
A be converted by applying a routine F by associating with M_A a GENERIC 
conversion routine having the form 

    EXPR(X:ANY, FM:MODE; ANY) 
        GENERIC 
            FM COVERS M_F => F(X); 
        END 

This allows point-to-point conversion between any actual value and any target 
mode (which can, if desired, be generic). 

One could apply this schema to all (source, destination) pairs for which 
conversion was desired. However, even using generic destination modes to cut 
down on the number of distinct destinations, the number of pairs could be 
undesirably large. Further, when defining a new mode M_A, it would be necessary 
to explicitly define each desired conversion to an existing mode. Again, the 
number of such conversions may be undesirably large. 

In the ELI framework, such an exhaustive enumeration of point-to-point 
conversions can be avoided by judicious use of functional composition. Consider, 
for example, the conversion from a CHAR argument to a REAL formal mode. 
The conversion CHAR -> REAL is almost surely the composition of CHAR -> INT 
and INT -> REAL. Similarly, COMPLEX -> INT is surely COMPLEX -> REAL 
followed by REAL -> INT. This suggests an analogous functional composition of 
conversion routines. For example, the conversion routine for CHAR might be 
written 

    EXPR(X:ANY, FM:MODE; ANY) 
        GENERIC 
            FM COVERS STRING => CONST(STRING OF X); 
            FM COVERS INT => CHAR_CODE(X); 
            (FM COVERS REAL) OR (FM COVERS COMPLEX) => 
                INT.CONVERT(CHAR_CODE(X), FM); 
        END 

The third alternative tests whether either REAL or COMPLEX is acceptable; 
if so, X is converted to an INT and the INT conversion routine is called to 
complete the work. The test in this case is used to ensure that the process 
does not run into a dead end. 
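
Conversion by composition can be illustrated with a small Python model. It is a sketch under assumptions: modes are strings, a united formal mode is a frozenset, `CONVERT` is a hypothetical table standing in for the CONVERT field of each mode, and the INT conversion routine shown is invented for the example (the text does not give one).

```python
ANY = "ANY"


def alternatives(mode):
    return mode if isinstance(mode, frozenset) else frozenset([mode])


def covers(g, a):
    """True if mode g covers mode a (g is ANY, or a's alternatives all lie in g)."""
    return g == ANY or alternatives(a) <= alternatives(g)


class ConversionError(Exception):
    """No alternative of the conversion routine applies."""


def int_convert(x, fm):
    # Assumed conversions out of INT, for illustration only.
    if covers(fm, "REAL"):
        return float(x)
    if covers(fm, "COMPLEX"):
        return complex(x, 0)
    raise ConversionError(f"INT cannot be converted to {fm}")


def char_convert(x, fm):
    if covers(fm, "STRING"):
        return str(x)
    if covers(fm, "INT"):
        return ord(x)                      # plays the role of CHAR_CODE
    if covers(fm, "REAL") or covers(fm, "COMPLEX"):
        return int_convert(ord(x), fm)     # compose CHAR -> INT -> fm
    raise ConversionError(f"CHAR cannot be converted to {fm}")


CONVERT = {"INT": int_convert, "CHAR": char_convert}


def pass_argument(value, argument_mode, formal_mode):
    """Convert only when the formal mode does not cover the argument mode."""
    if covers(formal_mode, argument_mode):
        return value
    return CONVERT[argument_mode](value, formal_mode)
```

The guard on the third alternative of `char_convert` mirrors the test in the routine above: the INT routine is entered only when it is known to be able to finish the job.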

In general, specifying conversion by composition allows the programmer to 
factor the conversion bush stemming from a data type. This does not address 
the question of what the paths should be when there is more than one path logi- 
cally possible. An answer to the question can only come from a knowledge of 
what the data types represent; i.e., a decision must be based on the specific 
application. The point of the factoring scheme is to provide a concise notation 
for expressing the desired paths, once a decision has been made. 

Thus far the discussion has centered around a hypothetical evaluator with 
an actual argument requiring conversion in hand. That is, we have described 
the actions of the interpreter and neglected the compiler. In explaining the 
semantics of type conversion, this benign neglect is actually quite appropriate. 
One general rule of ELI semantics is that the evaluator model dictates actions 
in complex circumstances; the compiler is constrained to produce code that does 
the same thing. Applying the general rule to type conversion results in the follow- 
ing dictum: the compiler must generate code for type conversions which has 
results identical to those which would have been obtained using the interpreter. 

To take a concrete example, consider a function call 

    FOO(MUMBLE(X)) 

where the formal parameter of FOO has mode M_F and the formal result type of 
the MUMBLE routine is M_R. There are three cases: 

    (1) M_F covers M_R 
    (2) M_F does not cover M_R and M_R is non-generic 
    (3) M_F does not cover M_R and M_R is generic. 

In case 1, no conversion is required. 

In case 2, the function call is treated as if the program had read 

    FOO(M_R.CONVERT(MUMBLE(X), M_F)) 

Most likely M_R.CONVERT is GENERIC so that the compile-time GENERIC 
selection mechanism is invoked to choose the appropriate alternative. There 
is one new point here: one or more of the predicates may contain expressions 
of the form 

    FM COVERS M_i 

where FM is the second formal parameter — the desired mode. To make a 
compile-time selection here, the compiler must handle generic conversions 
somewhat specially and recognize that in such cases it knows the value of FM 
so that the predicate may be evaluatable. This is related to a point mentioned 
earlier: the compiler must recognize that it may know the value of functions 
such as STRUCTP and LENGTH appearing in predicates of other GENERICS. 
The compiler has, of course, considerable specific knowledge of this sort. 
Once an alternative code body is chosen, compiling the call is straightforward. 
Here, as with any other call, there is a choice to be made between generating 
code to call the existing code block and generating an in-line expansion. The 
choice is best made by the compiler, based on the size of the code to be copied 
and the setting of parameters which control the space/time trade-off. 

Case 3 is somewhat subtle. It is known that the actual result mode, M_A, 
will be one of the alternatives of M_R. Either (I) M_F covers M_A, in which case 
no type conversion is required, or (II) it does not, in which case the conversion 
routine M_A.CONVERT must be invoked. However, the compiler has no way of 
determining what M_A will be. Hence, it generates code which tests the mode 
of MUMBLE(X) at run time and invokes the conversion routine for that mode if 
II holds. The use of run time type testing here is vital. It would not be 
acceptable to interpret case 3 as FOO(M_R.CONVERT(MUMBLE(X))). The generic 
mode M_R could call for conversions quite different from those invoked by the 
actual result mode; compiled code could then produce quite different results 
from the same program run interpretively. 

The case of a generic compilation mode M_R which is not covered by a 
formal mode M_F is not confined to the values of routines. Similar situations 
are produced by blocks, variables, etc., wherever the compiler cannot 
completely determine the data type of a construct. In the important case of blocks, 
the different data types frequently arise from different block exits. Hence, the 
compiler can distribute the type conversion in space so as to use the efficient 
case 2 treatment. For example, suppose that FOO is called with the argument 

    BEGIN 
        P(X) => Y; 
        ELSE J 
    END 

where MD(Y)=REAL, MD(J)=INT, so that the compilation mode M_R of the block 
is ONEOF(REAL, INT). Suppose M_F does not cover this. A commonly used 
technique in language design is to "widen" the result of the block to REAL and 
compile in code for REAL to M_F conversion. This is poor design, since the 
technique can result in unnecessary conversion steps. The solution used in the 
ELI compiler is to treat each statement that can lead out of a block 
independently. Hence, the block is compiled as if it had been 

    BEGIN 
        P(X) => REAL.CONVERT(Y, M_F); 
        ELSE INT.CONVERT(J, M_F) 
    END 

This is better, in both time and space, than forcing an artificial widening. It is 
particularly attractive when M_F covers either REAL or INT so that the 
appropriate conversion routine is omitted. 

Section 8: Programmer Specified Mode Behavior 

The treatment of mode definition given in section 4 centered on the 
construction of modes. That is, a mode WALDO can be defined to be a set of 
objects having fields A, B, C of types t_1, t_2, t_3, etc. Such a syntactic 
specification is, however, only one aspect of mode definition. Indeed, at a 
sufficiently high level of abstraction,† a level which is often only implicit in 
programming, the syntactic specification is irrelevant; what is of interest is how 
an object behaves. In this view of programming, a syntactic mode specification is a 
lower level concept which serves to implement some higher level set of be- 
havioral laws. The definition mechanism of section 4 is then a necessary pre- 
requisite, but only as a basis on which to build a sophisticated mode definition 
mechanism. Given that this is the direction to be taken in providing a truly 
problem-oriented language, the issue is what constitutes a higher level mode 
definition and how to state such a definition in a convenient way. 

†Such layers of abstraction are directly related to the strata of Dijkstra's 
structured programming [9]. We pursue this point later. 

To some extent, section 6 has shown the approach to be taken. Consider 
two modes M1 and M2, defined 

    M1 <- SI::ARRAY(16, BOOL); 
    M2 <- MANT::ARRAY(16, BOOL); 

This introduces a sixth primitive mode constructor denoted by the infix operator 
"::". This takes a variable name as its left-hand operand and a mode as its 
right-hand operand and constructs a new labeled mode distinct from all modes 
which may be structurally similar but have different or no labels. In the above 
example, SI and MANT serve as labels for their respective modes, so that the 
two modes are not equal and neither is equal to ARRAY(16, BOOL). Since M1 
and M2 are different modes, they can be assigned different conversion routines, 
say C1 and C2, respectively. Suppose X1 and X2 are variables of modes M1 and 
M2, respectively. Structurally, they are identical. However, if used in a 
position where conversion is required (e.g., 3+X1 or 3+X2), they may act quite 
differently. To take a simple example, SI::ARRAY(16, BOOL) may represent 
a signed integer whose magnitude is less than 2^15 using 16 bits in two's 
complement notation, while MANT::ARRAY(16, BOOL) may represent a real number 
between 0 and 1. These facts about representation are stored in the routines 
that handle conversion from these modes to other modes (here, SCALAR). The 
algorithm which uses X1 and X2 itself displays none of these representational 
issues. It performs the abstract operation of addition and the data type 
definitions of X1 and X2 determine the rest. It should be noted that two 
mechanisms are employed: the implicit type conversion for the arguments to the 
+ routine and the GENERIC mechanism in the + routine itself. For the purpose of 
this discussion, the former is the more important since it is more global in 
scope: the conversion routine for M1 will be applied in any situation where a 
value of mode M1 causes a type fault. 

This separation of abstract process-oriented algorithm from detailed mode- 
dependent manipulations is clearly a step in the right direction. To push this 
further, we need only find other "global" situations in which mode-specific 
manipulations should be called into play. Two† others have been chosen for 
consideration in ELI: selection and assignment. To pursue the above example, 
programmer control over the meaning of assignment would allow one to specify 
that 

    X1 <- -34.2 

is to cause the real value to be converted to an integer and that value packed 
into 16 bits. 

†It would be easy to justify additional ones, such as generation or storage 
reclamation. Some experience will perhaps be required before a completely 
satisfactory set is found. 

A second example may be useful to illustrate the power of this technique. 
For debugging and other purposes, it is frequently useful to be able to monitor 
the value of a variable and take some special action (e.g., output of an error 
message) under certain abnormal conditions (e.g., when its value exceeds the value 
of another variable). Let X be such a variable of mode M, let P(X) be a 
predicate which tests for abnormal conditions, and let A(X) be the action to be 
taken. One can define a new mode M' in terms of M, P, and A as follows: 

    (1) Objects of mode M' are structurally identical to objects of mode M. 
    (2) Whenever a formal mode M is required and an actual value of mode M' 
        is in hand, the M' is treated as if it were an M. 
    (3) Whenever an object of mode M' is assigned a new value X, the 
        predicate P(X) is evaluated; if the result is true, A(X) is executed. 

One can go further and automate this process by defining a routine 
SENSITIVE_MODE (as a function of one mode M and two routines P and A) that 
constructs the new mode M'. Having written this one routine, the programmer 
has at his disposal the notion of "sensitive object" for any mode M. Redeclaring 
any variable to be a SENSITIVE_MODE(M, P, A) inserts the monitoring probes 
with no other changes to the program required. 
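
A rough Python analogue of SENSITIVE_MODE is a function that manufactures a monitored cell type from a base type, a predicate, and an action. This is a sketch under assumptions: `sensitive_mode` is a hypothetical name, assignment is modelled by setting a `value` property, and the initial value is assumed not to trigger the probe.

```python
def sensitive_mode(mode, predicate, action):
    """Build a cell type over `mode` whose assignments are monitored."""

    class Sensitive:
        def __init__(self, initial):
            self._value = mode(initial)    # assumption: no probe on creation

        @property
        def value(self):
            # Property (2): where an M is wanted, the stored M is handed over.
            return self._value

        @value.setter
        def value(self, new):
            # Property (3): evaluate P on every assignment; run A if it holds.
            new = mode(new)
            if predicate(new):
                action(new)
            self._value = new

    Sensitive.__name__ = f"Sensitive_{mode.__name__}"
    return Sensitive


alarms = []
Monitored = sensitive_mode(int, lambda v: v > 100, alarms.append)
x = Monitored(5)
x.value = 50      # normal: no action taken
x.value = 150     # abnormal: the action records the offending value
```

After these assignments `alarms` holds the single value 150, and the code that uses `x.value` is unchanged by the monitoring.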

To provide some substance to the discussion, we turn to a third example 
which we treat in detail. Consider defining the mode "ring buffer of characters." 
If X is such a buffer, its chief characteristics are: 

    (1) An assignment of a CHAR value to X pushes the character onto the back 
        end of the buffer if there is room, else error. 
    (2) Use of X where formal mode CHAR is desired pops a character from 
        the front end of the buffer if the buffer is non-empty, else error. 

Very likely, other properties would be desirable: 

    (3) The buffer can be treated as if it maintained at all times a correct count 
        of the number of characters it holds and X.COUNT accesses this 
        component. 
    (4) The top element of the buffer can be inspected without popping it by 
        selecting the TOP element, i.e., X.TOP. 

A possible structure for such buffers is 

    STRUCT(FRONT:INT, BACK:INT, BODY:ARRAY(K, CHAR)); 

where K is some constant, the maximum number of characters the buffer is 
to hold. (To simplify the discussion, we suppose a given value for K, say 200.) 
FRONT and BACK will be indices to the front and back ends of the buffer, with 
the convention that characters go in the back end and out the front end. Hence, 
FRONT chases BACK backward around the ring with modulo K arithmetic. 
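
The representation can be tried out in Python before any special mode behavior is attached. The sketch below makes assumptions that go beyond the text at this point: FRONT is taken as the index of the next character out and BACK as the index of the next slot to fill, both indices move downward modulo K, the buffer is empty when they are equal, and one slot is left unused so that a full buffer can be told apart from an empty one. The class and exception names are hypothetical.

```python
class RingBufferError(Exception):
    """Raised on overflow or on access to an empty buffer."""


class RingBuffer:
    def __init__(self, k=200):
        self.k = k
        self.front = 0            # index of the next character to come out
        self.back = 0             # index of the next slot to be filled
        self.body = [None] * k

    @property
    def count(self):
        # Number of characters held, using modulo-K arithmetic.
        return (self.front - self.back) % self.k

    @property
    def top(self):
        # Inspect the front character without removing it.
        if self.front == self.back:
            raise RingBufferError("buffer empty")
        return self.body[self.front]

    def push(self, ch):
        # Characters go in at the back end.
        if self.count == self.k - 1:
            raise RingBufferError("buffer full")
        self.body[self.back] = ch
        self.back = (self.back - 1) % self.k

    def pop(self):
        # Characters come out at the front end; FRONT then moves toward BACK.
        ch = self.top
        self.front = (self.front - 1) % self.k
        return ch
```

With K = 4 this sketch holds at most three characters; pushing "a", "b", "c" and then popping returns them in the same order.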

To establish the desired behavior, it will be necessary to use rather special 
assignment, selection, and conversion functions. We do not want these applied 
to arbitrary objects that happen to have the above structure; hence, in actually 
defining the desired mode, we use a label to create a unique mode 

    RBUFF <- RB::STRUCT(FRONT:INT, BACK:INT, BODY:ARRAY(K, CHAR)); 

RBUFF is a mode-valued variable. Its value is defined (by the assignment) to be 
RB::STRUCT(FRONT:INT, BACK:INT, BODY:ARRAY(200, CHAR)) which differs 
from other modes having identical structure but different (or no) label. 

The desired behavior of RBUFF is established by assignment to the 
SELECT, ASSIGN, and CONVERT fields of this mode. We consider these in 
turn. We have established that if X is an RBUFF, there are to be two and only 
two "fields 11 which may be selected: 

X. COUNT which gives the number of items in the buffer 
X.TOP which gives the top item of the buffer. 

That these fields do not actually exist as such is irrelevant, so long as the 
mode definition creates the desired behavior (illusion if you will). Further, 
when using an RBUFF as an RBUFF, there is no need to directly access the 
fields FRONT, BACK, and BODY. The job of the selection routine is to define 
the desired fields "COUNT" and "TOP" in terms of the fields which actually 
exist and simultaneously render these latter fields unavailable to direct access. 
The language evaluator provides a triggering mechanism for this definition. It 
calls the selection routine of RBUFF on any selection of the form

X.<fieldname>

where X is of mode RBUFF. (If no explicit definition is made, a system-generated
selection function is used; it is this that establishes the "normal" meaning of
selection for a defined mode.) The routine will be passed two arguments: the
object being selected from and the name of the field represented as a symbol.
Consider

RBUFF.SELECT <-
  EXPR(X:RBUFF, FD:SYMBOL; INT)
    GENERIC
      FD="TOP" => BEGIN
          UR(X).FRONT≠UR(X).BACK =>
              UR(X).BODY[UR(X).FRONT]
          ELSE BUFF_EMPTY(X);
        END
      FD="COUNT" => BEGIN
          DECL F:INT BYVAL UR(X).FRONT;
          DECL B:INT BYVAL UR(X).BACK;
          F>B => F-B;
          F<B => 200-B+F;
          ELSE 0
        END
      ELSE SELECTION_FAULT(RBUFF, FD)
    END

The routine tests the field name by comparing it to the symbol-valued constants
"TOP" and "COUNT". Based on this comparison, the main conditional discriminates
between three main cases: (1) the <fieldname> is TOP, (2) the <fieldname>
is COUNT, and (3) neither of the above. The last case is treated as an error and
a system error routine is called. Consider the case FD="TOP". We adopt the
convention that FRONT is the index of the first good element to be emptied on
output and BACK is the index of the next element to be filled on input; hence the
buffer is non-empty whenever FRONT≠BACK. However, it is not possible to
make the required test by writing

X.FRONT≠X.BACK

Since X is an RBUFF, this would invoke the selection routine for RBUFF recursively.
What we need is the selection routine not for RBUFF but rather for the
underlying representation STRUCT(FRONT:INT, BACK:INT, BODY:ARRAY(200,
CHAR)). The primitive routine UR maps X onto an object having the same pattern
of bit values but a different mode, the mode of the underlying representation. In
fact, no copying need be done: X and UR(X) refer to the same object; they just
ascribe different modes to this object. With this explanation of UR, the rest of
the code should be fairly clear: if the buffer is not empty, its top element is
selected. As to the case FD="COUNT", the block uses two local variables simply
to avoid writing UR(X).FRONT and UR(X).BACK repeatedly; the number of characters
is calculated in the obvious way.
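
For readers unfamiliar with the notation, the selection routine can be mimicked in Python. This is only an analogue under stated assumptions: the names `Rep`, `RBuff`, and `_ur` are invented for illustration, `__getattr__` stands in for the evaluator's triggering mechanism, and the starting values FRONT = BACK = K are an assumption the text does not state.

```python
from dataclasses import dataclass, field

K = 200  # capacity assumed in the text


@dataclass
class Rep:
    """Underlying representation: STRUCT(FRONT, BACK, BODY)."""
    front: int = K   # assumption: an empty buffer starts with FRONT = BACK = K
    back: int = K
    body: list = field(default_factory=lambda: [None] * (K + 1))  # slots 1..K


class RBuff:
    """Offers only the virtual fields TOP and COUNT."""

    def __init__(self):
        self._ur = Rep()   # plays the role of UR(X)

    def __getattr__(self, name):
        # Python calls this only when ordinary lookup fails, so every
        # field name other than _ur is routed through here.
        ur = self.__dict__["_ur"]
        if name == "TOP":
            if ur.front != ur.back:
                return ur.body[ur.front]
            raise IndexError("buffer empty")          # BUFF_EMPTY
        if name == "COUNT":
            f, b = ur.front, ur.back
            if f > b:
                return f - b
            if f < b:
                return K - b + f
            return 0
        raise AttributeError(name)                    # SELECTION_FAULT
```

Unlike the mode definition in the text, this sketch does not truly hide the representation: `x._ur` remains reachable by any caller. It shows only how TOP and COUNT can be computed from fields that the user of the buffer never names.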

Consider next the conversion routine for RBUFF. Assuming for the sake of 
simplicity that the only conversion to be considered is RBUFF -> CHAR, we obtain

RBUFF.CONVERT <-
  EXPR(X:RBUFF, FM:MODE; CHAR)
    BEGIN
      DECL F:INT BYVAL UR(X).FRONT;
      DECL TEMP:CHAR;
      FM≠CHAR => TYPE_FAULT(RBUFF, FM);
      F = UR(X).BACK => BUFF_EMPTY(X);
      TEMP <- UR(X).BODY[F];
      F <- UR(X).FRONT <- F-1;
      F=0 -> UR(X).FRONT<-200;
      TEMP
    END

This creates a local variable F initialized to the value of UR(X).FRONT and a
TEMP of mode CHAR to hold the result of the routine. The next two lines test
that the desired mode is CHAR and that the buffer is not empty. Then TEMP is
assigned the top element and UR(X).FRONT is decremented. If the new value is
zero, UR(X).FRONT is wrapped around the buffer. Finally, the block (and hence
the routine) returns TEMP as its value.
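
The same steps can be traced in a short Python sketch. It is an illustration only: the underlying representation is modeled as a dictionary with keys FRONT, BACK, and BODY, Python's `str` stands in for CHAR, and the function name `convert` is hypothetical.

```python
K = 200  # capacity assumed in the text


def convert(rep, wanted):
    """Pop and return the front element of `rep` (keys FRONT, BACK, BODY)."""
    if wanted is not str:
        raise TypeError("only conversion to CHAR is defined")   # TYPE_FAULT
    f = rep["FRONT"]
    if f == rep["BACK"]:
        raise IndexError("buffer empty")                         # BUFF_EMPTY
    temp = rep["BODY"][f]
    f = rep["FRONT"] = f - 1      # decrement FRONT, keeping the new value in f
    if f == 0:
        rep["FRONT"] = K          # wrap around the ring
    return temp
```

Note that the conversion has a side effect on its argument: converting a buffer to a character removes that character from the buffer.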

The assignment routine is similar and should be self-explanatory: 

RBUFF.ASSIGN <-
  EXPR(X:RBUFF, Y:CHAR; CHAR)
    BEGIN
      UR(X).BODY[UR(X).BACK] <- Y;
      UR(X).BACK <- UR(X).BACK-1;
      UR(X).BACK=0 -> UR(X).BACK<-200;
      UR(X).BACK≠UR(X).FRONT => Y;
      ELSE BUFF_OVERFLOW(X)
    END

One point should be noted. Suppose B1 and B2 are both RBUFFs and consider
the assignment

B1 <- B2;

Since the left-hand operand is an RBUFF, the RBUFF assignment routine is
called. Binding its formals to its arguments proceeds as follows. The formal
X is an RBUFF so this is bound directly to B1. However, the formal Y is a
CHAR and the argument B2 is an RBUFF. Hence, the conversion routine for
RBUFFs is called with arguments B2 and CHAR. The result is to pop an element
from B2 and bind the formal Y to this. Hence, the assignment causes an
element to be popped from the front of B2 and added to the back of B1.
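
The interplay of assignment and conversion can be sketched in Python as follows. The class and method names are hypothetical, and the explicit `isinstance` test is an assumption that stands in for the evaluator's automatic conversion of an argument whose mode the formal does not cover.

```python
K = 200  # capacity assumed in the text


class RBuff:
    def __init__(self):
        self.front = self.back = K        # assumption: empty buffer starts at K
        self.body = [None] * (K + 1)      # slots 1..K

    def to_char(self):                    # plays the role of RBUFF.CONVERT
        if self.front == self.back:
            raise IndexError("buffer empty")
        ch = self.body[self.front]
        self.front -= 1
        if self.front == 0:
            self.front = K
        return ch

    def assign(self, y):                  # plays the role of RBUFF.ASSIGN
        if isinstance(y, RBuff):
            y = y.to_char()               # formal Y is a CHAR: convert first
        self.body[self.back] = y
        self.back -= 1
        if self.back == 0:
            self.back = K
        if self.back == self.front:
            raise OverflowError("buffer overflow")
        return y
```

With `b1` and `b2` both instances of `RBuff`, the call `b1.assign(b2)` moves one element from the front of `b2` to the back of `b1`, which is the behavior described for `B1 <- B2`.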

Turning from the specific to the general, several points should be noted. 
(1) It is straightforward to treat the buffer size K and the mode M of its con- 
stituent elements as parameters and write a routine BUFFER(K, M) that produces 
a mode for any values of K and M. Such a BUFFER routine can be viewed as a 
realization of one implementation technique of the concept of buffer. From an- 
other point of view, one can ignore the implementation and take BUFFER as an 
abstract set of data types with certain properties. (2) The notion of underlying 
representation has a natural extension. We have just used the mode 
STRUCT(FRONT:INT, BACK:INT, BODY:ARRAY(200, CHAR)) as a basis for
defining RBUFF so that the former is the underlying representation of the latter; 
we could equally well use an RBUFF as a basis for defining a new mode, for 
which RBUFF would be the underlying representation. 
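
As a rough Python analogue of point (1), a function can compute and return a new class for given K and M, much as a mode-valued routine would compute a mode. The function name `buffer` and the methods `push` and `pop` are illustrative assumptions, not the paper's BUFFER routine.

```python
def buffer(k, m):
    """Build and return a ring-buffer class for size k and element type m."""

    class Buffer:
        size, elem = k, m

        def __init__(self):
            self.front = self.back = k       # assumption: empty buffer starts at k
            self.body = [None] * (k + 1)     # slots 1..k, slot 0 unused

        def push(self, y):
            if not isinstance(y, m):
                raise TypeError(f"expected {m.__name__}")
            self.body[self.back] = y
            self.back = k if self.back == 1 else self.back - 1
            if self.back == self.front:
                raise OverflowError("buffer overflow")

        def pop(self):
            if self.front == self.back:
                raise IndexError("buffer empty")
            y = self.body[self.front]
            self.front = k if self.front == 1 else self.front - 1
            return y

    Buffer.__name__ = f"Buffer_{k}_{m.__name__}"
    return Buffer
```

Each call yields a distinct class, so `buffer(4, int)` and `buffer(200, str)` behave as separate data types sharing one implementation technique.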

We illustrate this notion with an example. A character stream is frequently 
used to encode a virtual character set greater than that actually available by 
using one or more characters as escape characters whose appearance changes 
the interpretation of characters which follow them. In such cases, a virtual 
character is an array of actual characters. It is then useful to consider a class 
of buffers into which single CHARs can be pushed at the back end but which 
deliver STRINGs at the front end. Let this mode be called STRBUF. One could
in principle define STRBUF in terms of the underlying representation 
STRUCT(FRONT:INT, BACK:INT, BODY:ARRAY(200, CHAR)). However, it is 
far more convenient to use RBUFF as the underlying representation. Then 
STRBUF can be defined in terms of COUNT, TOP, and the operations of 
assigning to and converting from RBUFFs. In this definition, if X is an STRBUF,
then UR(X) is to be interpreted as an RBUFF. In this fashion, any existing data 
type can be used as the basis for defining a new type. 
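
A minimal Python sketch of this layering follows. Several things here are assumptions made for illustration: the escape character is taken to be a backslash, each escape is taken to cover exactly one following character, and `RBuff` is a deliberately simplified stand-in (backed by a deque) that offers only COUNT, TOP, and the push and pop operations.

```python
from collections import deque

ESC = "\\"   # hypothetical escape character


class RBuff:
    """Simplified stand-in for RBUFF: only count, top, push, and pop are offered."""

    def __init__(self):
        self._items = deque()

    @property
    def count(self):
        return len(self._items)

    @property
    def top(self):
        return self._items[0]

    def push(self, ch):
        self._items.append(ch)

    def pop(self):
        return self._items.popleft()


class StrBuf:
    """Single characters go in at the back; strings come out at the front."""

    def __init__(self):
        self._ur = RBuff()               # UR(X) is an RBUFF

    def push(self, ch):
        self._ur.push(ch)

    def pop(self):
        if self._ur.count == 0:
            raise IndexError("buffer empty")
        if self._ur.top != ESC:
            return self._ur.pop()        # ordinary character
        if self._ur.count < 2:
            raise ValueError("incomplete escape sequence")
        return self._ur.pop() + self._ur.pop()   # escape plus the character after it
```

`StrBuf` never touches the fields of the ring itself, so the representation of `RBuff` could be changed without altering `StrBuf`.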

The syntax for carrying this out is simple since the labeling of modes can 
be cascaded. If M is a mode and N is an identifier, then

N::M

is a new mode with label N based on the mode M. For example,

WALDO::RBUFF

defines a new mode equal to WALDO::RB::STRUCT(FRONT:INT, BACK:INT,
BODY:ARRAY(200, CHAR)). If X is an instance of this mode, UR(X) is an RBUFF.
This new mode can itself be used as the second argument to the :: operator 
to build up a hierarchy of mode definitions. Such a definition scheme has a 
number of consequences closely related to Dijkstra's structured programming. 
In writing a program in the style of structured programming, one builds a
"string of pearls." Each pearl has its own set of abstract operations and data
types defined in terms of lower level pearls. Realizing this for operations is
straightforward; higher level routines are composed from lower level routines.
Programmer control over mode behavior, as discussed in this section, provides
an analogue for data. A mode M at one level in the string can be based on one
or more modes at lower levels. A well-engineered definition set will use the UR
routine only in the mode specification routines (selection, conversion, and
assignment). Operations at that level see only the behavior of the defined mode
M, not its definitions in terms of lower level notions. Hence, the actual
representation used to achieve this behavior is irrelevant at this and higher levels.
Without changing other parts of the program, one can vary this representation 
at will. This affords a very powerful means to (1) modify the program to perform 
related tasks, and (2) optimize performance for a given task. 
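
The cascading of labels can be modeled with a few lines of Python. This is a sketch of the idea only: the class `Labeled`, the functions `ur` and `labels`, and the tuple encoding of the STRUCT are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Labeled:
    """A mode formed by attaching a label to a base mode (label::base)."""
    label: str
    base: object      # a structure description or another Labeled mode


def ur(mode):
    """Strip one label, giving the mode of the underlying representation."""
    return mode.base


def labels(mode):
    """List the labels of a cascaded mode, outermost first."""
    out = []
    while isinstance(mode, Labeled):
        out.append(mode.label)
        mode = mode.base
    return out


STRUCT = ("STRUCT", ("FRONT", "INT"), ("BACK", "INT"),
          ("BODY", ("ARRAY", 200, "CHAR")))
RBUFF = Labeled("RB", STRUCT)
WALDO = Labeled("WALDO", RBUFF)
```

Here `ur(WALDO)` is `RBUFF` and `ur(RBUFF)` is the bare structure, while a mode with the same structure but a different label compares unequal to `RBUFF`.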
Section 9: Conclusion

The treatment of data types in ELI rests on nine points: 

(1) modes as values in the language,
(2) the facility for freezing free variables during compilation,
(3) the generic interpretation of mode union,
(4) routines as a unification of operators and procedures,
(5) inclusion of both generic routines and type conversion,
(6) programmer-defined generic routines and programmer control over type
    conversions,
(7) interpreter-based semantics for generic selection and type conversion
    which the compiler is constrained to follow,
(8) programmer control over mode behavior,
(9) the notion of underlying representation and the basing of one mode on the
    behavior of another.

Some of these are independent of one another and of the ELI language; these 
can be applied directly to other languages. Other points depend strongly on the 
language; carrying these over is a bit tricky. However, none of these is par- 
ticularly difficult to implement. 

The most radical points are the first and second. These are also the most 
significant. The inclusion of modes among the legitimate values in a language 
allows modes to be computed, providing a very powerful definitional capability.
A direct consequence is the concept of programmer-defined, mode-valued 
routines and, hence, the functional abstraction these provide. 

The first point demands the second (or a functional equivalent). Computed 
mode values are of interest only if they can be used as the types of variables in 
compiled routines. Hence, there must be some mechanism to specify that a 
particular non-primitive mode value is to be used as a data type. The specific 
technique used in ELI, the "freeze list," is somewhat immaterial. Other 
equivalent techniques could be used to the same effect. What is important is 
the concept of evaluability and the upward propagation of computation-tree
collapse. The utility of this mechanism goes beyond its use in connection with 
data types. It allows the programmer to nail down invariants of all sorts and 
have the program reflect the consequences of these invariants. 

Turning to point 3, we note that there are many possible interpretations of 
"union" as applied to data types. The one used in ELI was chosen on the basis 
of simplicity, implementability, and because it meshes most smoothly with the 
generic routines. This interpretation of union — the generic — treats a united 
mode as the postponement of a commitment until execution. Hence, during exe- 
cution the concept largely disappears, simplifying the semantic description. 

The use of routines as a unification of operators and procedures hardly 
requires comment. Apart from external syntax, there is no real difference 
between the two. A language which allows the programmer to define both should 
surely provide identical semantics for the two. The alternative is harder to 
communicate, learn, and implement. Regrettably, it is the common practice. 

Having both generic routines and type conversion is almost a necessity. 
Neither alone provides the right flexibility; neither is a good substitute for the 
other. The scheme used in ELI may be briefly summarized as: if a formal
mode covers the argument mode, binding is direct; otherwise, the argument
is converted to some mode that is covered.
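
The binding rule just summarized can be sketched in Python. The sketch rests on several assumptions: modes are modeled as Python types, a united mode as a tuple of types, and the available conversions as a table keyed by (source, target). Taking the first applicable conversion is also an assumption, since the summary says only that the argument is converted to some covered mode.

```python
def bind(formal, arg, conversions):
    """Bind `arg` to a formal of mode `formal` (a type, or a tuple for a union)."""
    covered = formal if isinstance(formal, tuple) else (formal,)
    if type(arg) in covered:
        return arg                         # formal covers the argument: bind directly
    for target in covered:                 # otherwise look for a conversion
        convert = conversions.get((type(arg), target))
        if convert is not None:
            return convert(arg)
    raise TypeError(f"cannot bind {type(arg).__name__}")
```

For example, with `conversions = {(int, float): float}`, binding the integer 3 to a formal of type `float` yields 3.0, while binding it to the union `(int, str)` leaves it unchanged.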

Given that a language includes generic routines and type conversion, it 
should follow that these be controllable by the programmer, i.e., that he be 
permitted to define generic routines (in addition to the built-in set) and that he 
be permitted to specify the type conversions. This is not hard to implement. 
Most of the necessary mechanisms are already present to handle the built-in 
definitions. Implementation of programmer control is mostly a matter of em- 
ploying these mechanisms on programmer-supplied definitions. 

The only subtle point is choosing a semantic model. Generic selection and 
type conversion can become complex, since conversions are typically cascaded 
and often applied to the arguments of generic routines. A compiler model is
awkward, since it either imposes language restrictions to insure compile- 
time knowledge of modes or, lacking this, produces a description which de- 
pends not on modes but the compiler's knowledge of modes. The use of an 
interpreter (i.e., run-time) model greatly simplifies the descriptive task. 

Giving the programmer control over the behavior of the modes he defines 
is again easy to implement, provided the implementation is done correctly. 
Any system which allows the definition of new data types must construct tables 
or their equivalent to give meaning to subsequent assignments, selections, type 
conversions, etc., using variables of these types. It is a short step and a
considerable improvement from such tables to system-generated routines tailored
to each new data type. The next step is to allow the programmer to specify the 
routines he wants invoked. His routines must define their operation in terms 
of a machine-independent underlying representation. If this is done correctly, 
the layering of underlying representations falls out naturally. 

In summary, the treatment of data types in ELI is based on a set of fairly 
straightforward notions, most of which are simple to implement. Much of this 
treatment can be carried over to other high level programming languages. The 
linguistic power they add is considerable. 